14 research outputs found

    A group sparsity-driven approach to 3-D action recognition

    In this paper, a novel 3-D action recognition method based on sparse representation is presented. Silhouette images from multiple cameras are combined to obtain motion history volumes (MHVs). The cylindrical Fourier transform of MHVs is used as the action descriptor. We assume that a test sample has a sparse representation in the space of training samples. We cast the action classification problem as an optimization problem and classify actions using group sparsity based on l1 regularization. We show experimental results using the IXMAS multi-view database and demonstrate the superiority of our method, especially when observations are low resolution, occluded, and noisy and when the feature dimension is reduced.
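
    The classification step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: an ISTA solver and plain (ungrouped) l1 regularization stand in for the paper's group-sparsity optimization, the regularization weight is arbitrary, and classification is by smallest class-wise reconstruction residual.

```python
import numpy as np

def ista(A, y, lam=0.1, iters=500):
    # Iterative soft-thresholding for min_x (1/2)||Ax - y||^2 + lam * ||x||_1
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - y) / L      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

def classify(A, labels, y):
    # A: columns are training descriptors; labels[i] is the class of column i.
    # The test sample y is assigned to the class whose training columns
    # reconstruct it with the smallest residual.
    x = ista(A, y)
    residuals = {}
    for c in set(labels):
        xc = np.where(np.array(labels) == c, x, 0.0)
        residuals[c] = np.linalg.norm(y - A @ xc)
    return min(residuals, key=residuals.get)
```

    In the paper the descriptors are cylindrical Fourier features of MHVs and sparsity is enforced group-wise per class; the sketch above works with any column-normalized feature matrix.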

    A graphical model based solution to the facial feature point tracking problem

    In this paper, a facial feature point tracker motivated by applications such as human-computer interfaces and facial expression analysis systems is proposed. The proposed tracker is based on a graphical model framework. The facial features are tracked through video streams by incorporating statistical relations in time as well as spatial relations between feature points. By exploiting the spatial relationships between feature points, the proposed method provides robustness in real-world conditions such as arbitrary head movements and occlusions. A Gabor feature-based occlusion detector is developed and used to handle occlusions. The performance of the proposed tracker has been evaluated on real video data under various conditions, including occluded facial gestures and head movements. It is also compared to two popular methods: one based on Kalman filtering exploiting temporal relations, and the other based on active appearance models (AAMs). Improvements provided by the proposed approach are demonstrated through both visual displays and quantitative analysis.
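
    The paper's full graphical model is not reproduced here, but its core idea — combining a temporal prediction for each point with the position implied by its spatial relations to the other points — can be caricatured in a few lines. The blend weights and the centroid-plus-offset spatial prior below are simplifying assumptions, not the actual inference procedure.

```python
import numpy as np

def track_step(prev_pts, meas_pts, offsets, alpha=0.5, beta=0.3):
    # One update of a heavily simplified tracker: blend each point's noisy
    # measurement with its previous position (temporal smoothing), then pull
    # it toward the position implied by the feature-set centroid plus a
    # learned per-point offset (a crude spatial prior).
    # `offsets[i]` is point i's offset from the centroid, assumed learned
    # from training data; the paper uses a full graphical model instead.
    pred = alpha * meas_pts + (1 - alpha) * prev_pts   # temporal blend
    centroid = pred.mean(axis=0)
    spatial = centroid + offsets                       # spatially implied positions
    return (1 - beta) * pred + beta * spatial
```

    Even this crude spatial term illustrates the benefit: when one point's measurement is corrupted (e.g., by occlusion), the other points pull it back toward a configuration-consistent position.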

    A sparsity-driven approach to multi-camera tracking in visual sensor networks

    In this paper, a sparsity-driven approach is presented for multi-camera tracking in visual sensor networks (VSNs). VSNs consist of image sensors, embedded processors, and wireless transceivers which are powered by batteries. Since the energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. Motivated by the goal of tracking in a bandwidth-constrained environment, we present a sparsity-driven method to compress the features extracted by the camera nodes, which are then transmitted across the network for distributed inference. We have designed special overcomplete dictionaries that match the structure of the features, leading to very parsimonious yet accurate representations. We have tested our method in indoor and outdoor people tracking scenarios. Our experimental results demonstrate how our approach leads to communication savings without significant loss in tracking performance.
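
    A toy version of the idea — an overcomplete dictionary whose atoms match the expected shape of the data, plus a greedy sparse coder — might look like this. Gaussian atoms and orthogonal matching pursuit are stand-ins for the specially designed dictionaries and the solver used in the paper.

```python
import numpy as np

def gaussian_dictionary(n, centers, widths):
    # Overcomplete dictionary of normalized Gaussian atoms. Likelihood-like
    # signals in tracking are often unimodal bumps, so a dictionary matched
    # to that structure represents them with very few atoms (assumption).
    x = np.arange(n)
    atoms = []
    for c in centers:
        for w in widths:
            a = np.exp(-0.5 * ((x - c) / w) ** 2)
            atoms.append(a / np.linalg.norm(a))
    return np.column_stack(atoms)

def omp(D, y, k):
    # Orthogonal matching pursuit: greedily pick k atoms, refitting the
    # coefficients by least squares after each selection.
    r, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ r))))
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        r = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x
```

    Only the few (index, coefficient) pairs of the selected atoms would need to be transmitted, which is where the communication saving comes from.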

    Graphical model based facial feature point tracking in a vehicle environment

    Facial feature point tracking is a research area that can be used in human-computer interaction (HCI), facial expression analysis, fatigue detection, etc. In this paper, a statistical method for facial feature point tracking is proposed. Feature point tracking is a challenging topic in the case of uncertain data caused by noise and/or occlusions. With this motivation, a graphical model is built that incorporates not only temporal information about feature point movements, but also information about the spatial relationships between such points. Based on this model, an algorithm that achieves feature point tracking through a video observation sequence is implemented. The proposed method is applied on 2-D grayscale real video sequences taken in a vehicle environment, and the superiority of this approach over existing techniques is demonstrated.

    Feature compression: a framework for multi-view multi-person tracking in visual sensor networks

    Visual sensor networks (VSNs) consist of image sensors, embedded processors, and wireless transceivers which are powered by batteries. Since the energy and bandwidth resources are limited, setting up a tracking system in VSNs is a challenging problem. In this paper, we present a framework for human tracking in VSNs. The traditional approach of sending compressed images to a central node has certain disadvantages, such as degrading the performance of further processing (i.e., tracking) because of low-quality images. Instead, we propose a feature compression-based decentralized tracking framework that is better matched with the further inference goal of tracking. In our method, each camera performs feature extraction and obtains likelihood functions. By transforming to an appropriate domain and taking only the significant coefficients, these likelihood functions are compressed and this new representation is sent to the fusion node. As a result, this allows us to reduce the communication in the network without significantly affecting the tracking performance. An appropriate domain is selected by performing a comparison between well-known transforms. We have applied our method to indoor people tracking and demonstrated the superiority of our system over the traditional approach and a decentralized approach that uses a Kalman filter.
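
    The transform-and-truncate step can be sketched with a DCT, one of the well-known transforms a comparison like the paper's would plausibly include; the choice of DCT and the number of retained coefficients here are illustrative assumptions, not the paper's selected domain.

```python
import numpy as np
from scipy.fft import dct, idct

def compress_likelihood(likelihood, k=8):
    # Camera-node side: transform to the DCT domain and zero out all but the
    # k largest-magnitude coefficients. In a VSN, only the k (index, value)
    # pairs would be transmitted to the fusion node.
    c = dct(likelihood, norm='ortho')
    c[np.argsort(np.abs(c))[:-k]] = 0.0
    return c

def reconstruct(c):
    # Fusion-node side: invert the transform to recover the likelihood.
    return idct(c, norm='ortho')
```

    Smooth, unimodal likelihoods compress well in such a domain: a small fraction of the coefficients reconstructs them with small error.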

    Eye feature point tracking by using graphical models

    In this paper, a statistical method for eye feature point tracking is proposed. The aim is to track feature points even when the observed data are uncertain because of noise and/or occlusion. With this motivation, a graphical model that uses the spatial information as well as the temporal information between points is built. The proposed method is applied on 2-D grayscale real video sequences as a real data application.

    Sparse representation frameworks for inference problems in visual sensor networks

    Visual sensor networks (VSNs) form a new research area that merges computer vision and sensor networks. VSNs consist of small visual sensor nodes called camera nodes, which integrate an image sensor, an embedded processor, and a wireless transceiver. Having multiple cameras in a wireless network poses unique and challenging problems that do not exist either in computer vision or in sensor networks. Due to the resource constraints of the camera nodes, such as battery power and bandwidth, it is crucial to perform data processing and collaboration efficiently. This thesis presents a number of sparse-representation based methods to be used in the context of surveillance tasks in VSNs. Performing surveillance tasks, such as tracking, recognition, etc., in a communication-constrained VSN environment is extremely challenging. Compressed sensing is a technique for acquiring and reconstructing a signal from a small number of measurements, utilizing the prior knowledge that the signal has a sparse representation in a proper space. The ability of sparse representation tools to reconstruct signals from a small number of observations fits well with the limitations in VSNs for processing, communication, and collaboration. Hence, this thesis presents novel sparsity-driven methods that can be used in action recognition and human tracking applications in VSNs. A sparsity-driven action recognition method is proposed by casting the classification problem as an optimization problem. We solve the optimization problem by enforcing sparsity through l1 regularization and perform action recognition. We have demonstrated the superiority of our method when observations are low-resolution, occluded, and noisy. To the best of our knowledge, this is the first action recognition method that uses sparse representation. In addition, we have proposed an adaptation of this method for VSN resource constraints.
We have also performed an analysis of the role of sparsity in classification for two different action recognition problems. We have proposed a feature compression framework for human tracking applications in visual sensor networks. In this framework, we perform decentralized tracking: each camera extracts useful features from the images it has observed and sends them to a fusion node, which collects the multi-view image features and performs tracking. In tracking, extracting features usually results in a likelihood function. To reduce communication in the network, we compress the likelihoods by first splitting them into blocks, then transforming each block to a proper domain and taking only the most significant coefficients in this representation. To the best of our knowledge, compression of features computed in the context of tracking in a VSN has not been proposed in previous works. We have applied our method to indoor and outdoor tracking scenarios. Experimental results show that our approach can save up to 99.6% of the bandwidth compared to centralized approaches that compress raw images to decrease the communication. We have also shown that our approach outperforms existing decentralized approaches. Furthermore, we have extended this tracking framework and proposed a sparsity-driven approach for human tracking in VSNs. We have designed special overcomplete dictionaries that exploit the specific known geometry of the measurement scenario and used these dictionaries for sparse representation of likelihoods. By obtaining dictionaries that match the structure of the likelihood functions, we can represent likelihoods with few coefficients, and thereby decrease the communication in the network. This is the first method in the literature that uses sparse representation to compress likelihood functions and applies this idea to VSNs.
We have tested our approach in indoor and outdoor tracking scenarios and demonstrated that it achieves greater bandwidth reduction than our feature compression framework. We have also shown that our approach outperforms existing decentralized and distributed approaches.
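
    The block-wise compression described in the thesis abstract — split the likelihood into blocks, transform each block, keep only the most significant coefficients — can be sketched as follows. The block size, the per-block coefficient count, and the use of a DCT are illustrative assumptions rather than the thesis's tuned choices.

```python
import numpy as np
from scipy.fft import dct

def blockwise_compress(likelihood, block=16, k=2):
    # Split into blocks (length must be a multiple of `block` in this sketch),
    # DCT each block, and keep the k largest-magnitude coefficients per block.
    blocks = likelihood.reshape(-1, block)
    kept = []
    for b in blocks:
        c = dct(b, norm='ortho')
        top = np.argsort(np.abs(c))[-k:]
        kept.append([(int(i), float(c[i])) for i in top])
    # Rough bandwidth ratio: one index plus one value per kept coefficient,
    # versus sending every raw sample.
    ratio = (k * 2) * len(blocks) / likelihood.size
    return kept, ratio
```

    Counting one index and one value per retained coefficient, the returned ratio gives a rough sense of the saving relative to transmitting the raw likelihood samples.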

    Facial feature point tracking based on a graphical model framework

    In this thesis, a facial feature point tracker that can be used in applications such as human-computer interfaces, facial expression analysis systems, and driver fatigue detection systems is proposed. The proposed tracker is based on a graphical model framework. The positions of the facial features are tracked through video streams by incorporating statistical relations in time and the spatial relations between feature points. In many application areas, including those mentioned above, tracking is a key intermediate step that has a significant effect on the overall system performance. For this reason, a good practical tracking algorithm should take into account real-world phenomena such as arbitrary head movements and occlusions. Many existing algorithms track each feature point independently and do not properly handle occlusions. This causes drifts in the case of arbitrary head movements and occlusions. By exploiting the spatial relationships between feature points, the proposed method provides robustness in a number of scenarios, including various head movements. To prevent drifts caused by occlusions, a Gabor feature-based occlusion detector is developed and used in the proposed method. The performance of the proposed tracker has been evaluated on real video data under various conditions. These conditions include occluded facial gestures, low video resolution, illumination changes in the scene, in-plane head motion, and out-of-plane head motion. The proposed method has also been tested on videos recorded in a vehicle environment, in order to evaluate its performance in a practical setting. Given these results, it can be concluded that the proposed method provides a general, promising framework for facial feature tracking. It is a robust tracker for facial expression sequences in which there are occlusions and arbitrary head movements.
The results in the vehicle environment suggest that the proposed method has the potential to be useful for tasks such as driver behavior analysis or driver fatigue detection.